Detrended fluctuation analysis (DFA) is an improved method of classical fluctuation analysis for nonstationary
signals where embedded polynomial trends mask the intrinsic correlation properties of the fluctuations. To
better identify the intrinsic correlation properties of real-world signals where a large amount of data is missing 
or removed due to artifacts, we investigate how extreme data loss affects the scaling behavior of long-range 
power-law correlated and anti-correlated signals. We introduce a new segmentation approach to generate sur- 
rogate signals by randomly removing data segments from stationary signals with different types of long-range 
correlations. The surrogate signals we generate are characterized by four parameters: (i) the DFA scaling exponent
α of the original correlated signal u(i), (ii) the percentage p of the data removed from u(i), (iii) the average
length of the removed (or remaining) data segments, and (iv) the functional form P(l) of the distribution of
the length l of the removed (or remaining) data segments. We find that the global scaling exponent of positively
correlated signals remains practically unchanged even for extreme data loss of up to 90%. In contrast, the global 
scaling of anti-correlated signals changes to uncorrelated behavior even when a very small fraction of the data 
is lost. These observations are confirmed on two examples of real-world signals: human gait and commodity 
price fluctuations. We further systematically study the local scaling behavior of surrogate signals with missing 
data to reveal subtle deviations across scales. We find that for anti-correlated signals even 10% of data loss leads
to significant monotonic deviations in the local scaling at large scales from the original anti-correlated towards 
uncorrelated behavior. In contrast, positively correlated signals show no observable changes in the local scaling 
for up to 65% of data loss, while for larger percentage of data loss, the local scaling shows overestimated regions 
(with higher local exponent) at small scales, followed by underestimated regions (with lower local exponent) at 
large scales. Finally, we investigate how the scaling is affected by the average length, probability distribution 
and percentage of the remaining data segments in comparison to the removed segments. We find that the average
length μ_r of the remaining segments is the key parameter which determines the scales at which the local
scaling exponent has a maximum deviation from its original value. Interestingly, the scales where the maximum
deviation occurs follow a power-law relationship with μ_r, whereas the percentage of data loss determines the
extent of the deviation. The results presented in this paper are useful to correctly interpret the scaling properties
obtained from signals with extreme data loss. 

PACS numbers: 



I. INTRODUCTION 



In real-world signals data can be missing or unavailable to
a very large extent, especially in archaeological, geological
and physiological recordings which, once recorded in the past,
often cannot be generated again. Knowing the effects which
data loss may have on the correlations and other dynamical 
properties of the output signals of a given system is instru- 
mental in accurately quantifying and modeling the underlying 
mechanisms driving the dynamics of the system. Significant 
data loss can also be caused by failure of the data collection 
equipment, as well as by the removal of artifacts or noise- 
contaminated data segments. To correctly interpret results ob- 
tained from correlated signals with missing data, it is impor- 
tant to understand how the dynamical properties of such sig- 
nals are affected by the degree of data loss. Here we systematically
investigate how loss of data changes the scaling proper-
ties of various long-range power-law anti-correlated and pos- 
itively correlated signals. Specifically, we develop a segmen- 
tation approach to generate surrogate signals by randomly re- 
moving data segments from stationary long-range power-law 
correlated signals, and we study how the correlation properties 
are affected by (i) the percentage of removed data, (ii) the av- 
erage length of the removed (or remaining) data segments and 
(iii) the functional form of the probability distribution of the 
removed (remaining) segments. We utilize the detrended fluc- 
tuation analysis (DFA) to quantify the effect of extreme data 
loss on the scaling properties of long-range correlated signals. 

Scaling (fractal) behavior was first encountered in a class 
of physical systems IH-llt] which for a given "critical" value 
of their parameters, exhibit complex organization among their 
individual components, leading to correlated interactions over 
a broad range of scales. This class of complex systems is typically
characterized by (i) multi-component nonlinear feed-
back interactions, (ii) non-equilibrium output dynamics, and 
(iii) high susceptibility and responsiveness to perturbations.
Scaling behavior has been found in a diverse group of sys-
tems — ranging from earthquakes, to traffic jams and eco- 
nomic crashes, to neuronal excitations as well as the dynamics 
of integrated physiologic systems under neural control — and 
has been associated with the underlying mechanisms of regu- 
lation of these systems flS]. The output signals of such sys- 
tems exhibit continuous fluctuations over multiple time and/or 
space scales where the amplitudes and temporal/spatial
organization of the fluctuations are characterized by the absence
of a dominant scale, i.e., scale-invariant behavior. Due to the
nonlinear mechanisms controlling the underlying interactions, 
the output signals of these systems are also typically non- 
stationary, which masks the intrinsic correlations. Traditional 
methods such as power-spectrum and auto-correlation analy- 
sis ifsl- fioll are not suitable for nonstationary signals. 

DFA is a robust method suitable for detecting long-range
power-law correlations embedded in nonstationary signals [11, 12].
It has been successfully applied to a variety of fields where
scale-invariant behavior emerges, such as cardiac dynamics [27-46],
human locomotion ijsTEjSSL, circadian rhythm [50-53], neural receptors
in biological systems ifsill, seismology [55, 56], meteorology iIstIi,
climate temperature fluctuations I, river flow and discharge [64, 65],
and economics [66-79]. The DFA method may also help identify
different states of the same system exhibiting different scaling
behavior, e.g., the DFA scaling exponent α for heart-beat intervals
is significantly different for healthy and sick individuals 32il44ll
as well as for wake and sleep states jsojlsllioilislHll.

Elucidating the intrinsic mechanisms of a given system re- 
quires an accurate analysis and proper interpretation of the 
dynamical (scaling) properties of its output signals. It is of- 
ten the case that the scaling exponent quantifying the temporal 
(spatial) organization of the systems' dynamics across scales 
is not always the same, but depends on the scale of observa- 
tion, leading to distinct crossovers — i.e., the value of the scal- 
ing exponent may be different for smaller compared to larger 
scales. Such behavior has been observed for diverse sys- 
tems, for example: (i) the spontaneous motion of microbeads 
bound to the cytoskeleton of living cells as quantified by the
mean-square displacement does not exhibit a Brownian mo- 
tion but instead undergoes a transition from subdiffusive to 
superdiffusive behavior with time Isoll : (ii) cardiac dynamics 
of healthy subjects during sleep are characterized by fluctua- 
tions in the heartbeat intervals exhibiting a crossover from a 
higher scaling exponent (stronger correlations) at small time 
scales (from seconds up to a minute) to a lower scaling ex- 
ponent (weaker correlations) at large time scales (from min- 
utes to hours), associated with changes in neural autonomic 
control during sleep Isol [sill ; and (iii) stock market dynam- 
ics where both absolute price returns and intertrade times ex- 
hibit a crossover from a lower scaling exponent at small time 
scales (up to a trading day) to much higher exponent at large 
time scales (from a trading day to many months), a behavior 
consistent for all companies on the market ll69l[79tl . However, 
crossovers may also be a result of various types of nonsta- 
tionarities and artifacts present in the output signals, which, 
if not carefully investigated, may lead to incorrect interpretation
and modeling of the underlying mechanisms regulating
the dynamics of a given system [44].

In previous studies, we have systematically investigated 
the effects of various types of nonstationarities, data pre- 
processing filters and data artifacts on the scaling behavior 
of long-range power-law correlated signals as measured by 
the DFA method [82-84]. In particular, we studied a type of
nonstationarity which is caused by the presence of discontinuities
(gaps) in the signal, i.e., how randomly removing data
segments of fixed length affects the scaling properties of
long-range power-law correlated signals [83]. Such discontinuities
may arise from the nature of the recordings, e.g., stock exchange
data are not recorded during the nights, weekends and
holidays [66-73]. In these situations, discontinuities correspond
to segments of fixed size.

Alternatively, discontinuities may be caused by the fact
that (i) part of the data is lost due to various reasons, and/or
(ii) some noisy and unreliable portions of continuous recordings
(e.g., measurement artifacts) are discarded prior to analysis
[27-39, 45, 46]. In these cases, the lengths of the lost or
removed data segments are random, and may follow a certain
type of distribution which can often be related to the process
responsible for the removal or loss of data, e.g., a data acquisition
device which fails randomly with a given probability
p will result in a geometric distribution P(l) = (1 − p)^l p with
mean μ = 1/p, where l is the length of the lost data segments.
Thus, investigating the effect of data loss is essential to determine
the true correlation properties of the signal output of a
given system.

To address this question, we propose a new segmentation 
algorithm to generate surrogate signals by randomly remov- 
ing data segments from long-range power-law correlated sig- 
nals with a-priori known scaling properties, and we investi- 
gate the effects of the percentage of the removed data, dif- 
ferent average lengths and different distributions of removed 
data segments. We compare the scaling behavior of the orig- 
inal signals with the scaling of the surrogate signals by sys- 
tematically studying changes in the DFA scaling exponent. 
We utilize local scaling exponents to reveal subtle deviations 
and to characterize changes in the scaling behavior at different
scales in signals with segments removed. We note that in
our investigation we consider the effect of data loss on signals 
where the scaling behavior remains constant for the duration 
of the observations. Signals comprised of segments charac- 
terized by different scaling exponents have been considered 
elsewhere llssll . 

This paper is structured as follows: in Sec. II A we briefly
describe the DFA method. In Sec. II B we describe how to
generate stationary long-range power-law correlated signals.
In Sec. II C we introduce an algorithm for randomly removing
data segments from these signals to test the effects of data
loss on the scaling behavior. In Sec. III A we study the effect
of data loss on the global scaling of positively correlated
and anti-correlated artificially generated signals with different
length, and we show examples on two different sets of empirical
data. In Sec. III B we compare the local scaling properties
of correlated signals before and after data removal by
considering the effect of several parameters of the removed
segments. In Sec. III C we consider the inverse situation:
instead of focusing on the properties of the removed segments
we investigate how the correlations/scaling of the signal depend
on the properties of the remaining data segments. We
summarize and discuss our findings in Sec. IV.



II. METHODS 

A. Detrended fluctuation analysis (DFA) 

The DFA is a random walk based method [11]. It is an improvement
of the classical fluctuation analysis (FA) for nonstationary
signals where embedded polynomial trends mask
the intrinsic correlation properties in the fluctuations [11]. The
performance of DFA for signals with different types of
nonstationarities and artifacts has been extensively studied and
compared to other methods of correlation analysis lfl2[ [82l - [ssll.
The DFA method involves the following steps [11]:

(i) A given signal u(i) (i = 1, ..., N, where N is the length
of the signal) is integrated to obtain the random walk profile
y(k) = \sum_{i=1}^{k} [u(i) - \langle u \rangle], where ⟨u⟩ is the mean of u(i).

(ii) The integrated signal y(k) is divided into boxes of equal
length n.

(iii) In each box of length n we fit y(k) using a polynomial
function of order ℓ which represents the trend in that box. The
y coordinate of the fit curve in each box is denoted by y_n(k).
When a polynomial fit of order ℓ is used, we denote the
algorithm as DFA-ℓ. Note that, due to the integration procedure in
step (i), DFA-ℓ removes polynomial trends of order ℓ − 1 in
the original signal u(i).

(iv) The integrated profile y(k) is detrended by subtracting
the local trend y_n(k) in each box of length n:

    Y(k) = y(k) - y_n(k).    (1)

(v) For a given box length n, the root-mean-square (rms)
fluctuation function for this integrated and detrended signal is
calculated:

    F(n) = \sqrt{ \frac{1}{N} \sum_{k=1}^{N} [Y(k)]^2 }.    (2)

(vi) The above computation is repeated for a broad range
of box lengths (where n represents a specific space or time
scale) to provide a relationship between F(n) and n.
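
Steps (i)-(vi) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `dfa` and `scaling_exponent` are hypothetical, boxes are non-overlapping, and any data left over after the last full box are dropped (so the average in step (v) runs over the points that fall in full boxes rather than over all N points).

```python
import numpy as np

def dfa(u, scales, order=2):
    """Return the rms fluctuation F(n) for each box length n (DFA-`order`).

    Assumption: non-overlapping boxes; the tail that does not fill a box is dropped.
    """
    u = np.asarray(u, dtype=float)
    y = np.cumsum(u - u.mean())                        # step (i): profile y(k)
    fluct = []
    for n in scales:
        n_boxes = len(y) // n                          # step (ii): boxes of length n
        boxes = y[: n_boxes * n].reshape(n_boxes, n).T # shape (n, n_boxes)
        k = np.linspace(-1.0, 1.0, n)                  # rescaled abscissa for a stable fit
        design = np.vander(k, order + 1)               # columns k^order, ..., k, 1
        coeffs, *_ = np.linalg.lstsq(design, boxes, rcond=None)  # step (iii): local trend
        detrended = boxes - design @ coeffs            # step (iv): Y(k) = y(k) - y_n(k)
        fluct.append(np.sqrt(np.mean(detrended ** 2))) # step (v): rms fluctuation
    return np.array(fluct)

def scaling_exponent(scales, fluct):
    """Slope of log F(n) versus log n, i.e. the global exponent alpha."""
    return np.polyfit(np.log(scales), np.log(fluct), 1)[0]

# Usage: white noise should give alpha near 0.5, its running sum near 1.5.
rng = np.random.default_rng(0)
white = rng.standard_normal(2 ** 14)
scales = np.unique(
    np.logspace(np.log10(16), np.log10(len(white) // 8), 20).astype(int)
)                                                      # step (vi): range of box lengths
alpha_white = scaling_exponent(scales, dfa(white, scales))
alpha_brown = scaling_exponent(scales, dfa(np.cumsum(white), scales))
```

The fit is done on a rescaled abscissa only for numerical stability; the detrended residuals do not depend on that choice.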

A power-law relation between the root-mean-square fluctuation
function F(n) and the box size n, i.e., F(n) ~ n^α,
indicates the presence of scale-invariant behavior embedded
in the fluctuations of the signal u(i). The fluctuations
can be characterized by a scaling exponent α, a self-similarity
parameter which represents the long-range power-law correlation
properties of the signal. If α = 0.5, there is no correlation
and the signal is uncorrelated (white noise); if α < 0.5,
the signal is anti-correlated; if α > 0.5, the signal is positively
correlated; and α = 1.5 indicates Brownian motion (integrated
white noise). For stationary signals with long-range
power-law correlations, the value of the scaling exponent α
is related to the exponent β characterizing the power spectrum
S(f) ~ f^{−β} of the signal, where β = 2α − 1 UM.
Thus, the special case of 1/f noise, where β = 1, observed
in various physiological and biological system dynamics,
corresponds to α = 1. Since the power spectrum of stationary
signals is the Fourier transform of the auto-correlation function,
for signals with scale-invariant long-range positive correlations
and α < 1, one can find the following relationship
between the auto-correlation exponent γ and the power spectrum
exponent β: γ = 1 − β = 2 − 2α, where γ is defined by
the auto-correlation function C(τ) = τ^{−γ}, and should satisfy
0 < γ < 1 m.
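
As a worked check of these relations, the chain below restates them and evaluates one example value, α = 0.8. It is a sketch built only from the definitions above; the intermediate scaling F²(n) ~ n^{2−γ} is the standard result for the fluctuations of an integrated long-range correlated signal and is quoted here as an assumption, without derivation.

```latex
\begin{aligned}
C(\tau) \sim \tau^{-\gamma},\ 0<\gamma<1
  &\;\Longrightarrow\; S(f) \sim f^{-(1-\gamma)} \equiv f^{-\beta},
  \qquad \beta = 1-\gamma, \\
F^{2}(n) \sim n^{2-\gamma} \equiv n^{2\alpha}
  &\;\Longrightarrow\; \alpha = 1-\tfrac{\gamma}{2},
  \qquad \gamma = 2-2\alpha,
  \qquad \beta = 2\alpha-1, \\
\alpha = 0.8
  &\;\Longrightarrow\; \beta = 2(0.8)-1 = 0.6,
  \qquad \gamma = 2-2(0.8) = 0.4 = 1-\beta .
\end{aligned}
```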

We note that for anti-correlated signals, the scaling exponent
α obtained from the DFA method overestimates the true
correlations at small scales n [82]. To avoid this problem, one
needs first to integrate the original anti-correlated signal and
then apply the DFA method. The correct scaling exponent can
thus be obtained from the relation between n and F(n)/n
[instead of F(n)] (see Fig. 4(a)). This procedure is applied for all
cases of anti-correlated signals in this study. In our analysis in
the following sections we apply DFA-2. The choice of DFA-2
is dictated by the fact that this order of DFA-ℓ can accurately
quantify the scaling behavior of signals with exponents in the
range 0 < α < 3 lissll, which covers practically all signals
generated by real world systems. Moreover, earlier investigations
have demonstrated that DFA-2 is sufficient to accurately
quantify a broad range of nonstationary signals generated by
different physiologic dynamics, e.g., for heartbeat and gait
dynamics the exponent α obtained from higher order DFA-ℓ is
not significantly different compared to α obtained from DFA-2 [49].
Further, deviations from scaling which appear at small
scales become more pronounced in higher order DFA-ℓ lisoll.
In order to provide an accurate estimate of F(n), the largest
box size n we use is n = N/8, where N is the signal length.



B. Procedure to generate stationary signals with long-range
power-law correlations

We use a modified Fourier filtering technique [90] to generate
stationary long-range power-law correlated signals u(i)
(i = 1, 2, ..., N) with zero mean and standard deviation
σ = 1. The correlations of u(i) are characterized by a Fourier
power spectrum of a power-law form S(f) ~ f^{−β}, where f is
the frequency. By manipulating the Fourier spectrum of random
Gaussian-distributed sequences, we generate signal u(i)
with desired power-law correlations. This method consists of
the following steps:

(i) First, we generate a Gaussian-distributed sequence η(i)
with zero mean and unit standard deviation, and we
calculate its Fourier transformation η̂(f).

(ii) Next, we generate û(f) using the following transformation:

    \hat{u}(f) = \hat{\eta}(f) \, f^{-\beta/2},    (3)

where û(f) is the Fourier transform of the desired correlated
signal u(i) characterized by a Fourier power spectrum of the
form

    S(f) \sim |\hat{u}(f)|^2 \sim f^{-\beta}.    (4)

(iii) We calculate the inverse Fourier transform of û(f) to
obtain u(i). The generated stationary signal u(i) is then
normalized to zero mean and unit standard deviation.

FIG. 1: Illustration of generating a surrogate signal ũ(i) by removing
data points from the original signal u(i) according to a binary
series g(i). The positions i where g(i) = 0 (or 1) correspond to the
positions at which data points in u(i) are removed (or preserved) to
obtain ũ(i).

C. Algorithm to generate surrogate signals with randomly
removed segments

We introduce a new segmentation approach to generate surrogate
nonstationary signals ũ(i) by randomly removing data
segments from a stationary correlated signal u(i) and stitching
together the remaining parts of u(i). Such a "cutting" procedure
is often used in the pre-processing of data prior to analysis
in order to eliminate, for example, segments of data artifacts.
The proposed segmentation approach allows the simulation of
empirical data series where data segments are lost or removed.
The surrogate signals ũ(i) are characterized by four parameters:
(i) the DFA scaling exponent α of the original signal
u(i), (ii) the percentage p of the data removed, (iii) the average
length μ of the removed data segments as well as (iv) the
functional form P(l) of the distribution of the length l of the
removed data segments.

To generate a surrogate signal ũ(i) from the original signal
u(i), we first construct a binary sequence g(i) with the
same length N as u(i). In our algorithm the positions i where
g(i) = 0 will correspond to the positions at which data points
in u(i) are removed, while the positions where g(i) = 1 will
correspond to the positions in u(i) where data points are
preserved (Fig. 1).

We developed the following method to construct the binary
series g(i):

(i) We generate the lengths l_j (j = 1, 2, ..., M) of the segments
that will be removed from the original signal u(i) by
randomly drawing integer numbers from a given probability
distribution P(l) with mean value μ. Each integer number
drawn from P(l) represents the length of a segment removed
from u(i). The process continues until the summation of the
lengths of all removed segments becomes equal to or exceeds a
predetermined amount pN of data to be removed, i.e.,

    \sum_{j=1}^{M} l_j \geq pN,    (5)

where M is the minimal number to fulfill Eq. (5). Eventually,
we will cut the size of the last segment to obtain the exact
fraction pN of the lost data.

(ii) We append a "1" to each element in the series {l_j}
which will serve as a separator between two adjacent segments
(see step (iv)), and results in a new series {[l_j, 1]}. Note
that now the summation over the series yields pN + M.

(iii) We append N − (pN + M) "1" elements to the end
of the series {[l_j, 1]} to obtain an extended series where the
sum of all elements is N, equal to the length of the original
series u(i). This extended series is then shuffled leading to
a set of M elements [l_j, 1] randomly scattered in a "sea" of
N − (pN + M) "1" elements (see Eq. (6)).

(iv) Next, we replace the numbers l_j in Eq. (6) with l_j
elements of zeros, to obtain a binary series g(i) as shown in
Eq. (7).

    \{\ldots, 1, [l_{j+2}, 1], 1, \ldots\}    (6)

Note that, in step (iii) of our algorithm, the shuffling of
the extended series may cause two or more [l_j, 1] elements,
which represent removed data segments, to become direct
neighbors (Eq. (6)). Adding "1" to each element {l_j} in step
(ii) thus ensures that adjacent [l_j, 1] elements in the shuffled
extended series in Eq. (6) would not allow two or more separate
removed segments to be merged leading to the formation of
removed segments with longer average length μ and different
form of their probability distribution compared to the original
choice in step (i) of the algorithm.

Finally, the surrogate signal ũ(i) is obtained by simultaneously
scanning the original signal u(i) and the binary series
g(i) from Eq. (7), removing the i-th element in u(i) if g(i) = 0
and concatenating the segments of the remaining data (Fig. 1).
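
The signal generation of Sec. II B and the removal steps (i)-(iv) above can be sketched together in Python. This is an illustration under explicit assumptions, not the authors' code: `fourier_filter_signal` uses plain Fourier filtering (multiplying a white-noise spectrum by f^{−β/2} with β = 2α − 1), not the modified technique cited in the text; segment lengths are drawn from a geometric distribution as a discrete stand-in for the exponential case; and the names `fourier_filter_signal` and `removal_mask` are hypothetical.

```python
import numpy as np

def fourier_filter_signal(n, alpha, rng):
    """Signal with S(f) ~ f^(-beta), beta = 2*alpha - 1 (plain Fourier filtering)."""
    beta = 2.0 * alpha - 1.0
    eta_hat = np.fft.rfft(rng.standard_normal(n))   # spectrum of Gaussian white noise
    f = np.fft.rfftfreq(n)
    scale = np.zeros_like(f)
    scale[1:] = f[1:] ** (-beta / 2.0)              # leave out f = 0 (the mean)
    u = np.fft.irfft(eta_hat * scale, n)
    return (u - u.mean()) / u.std()                 # zero mean, unit standard deviation

def removal_mask(n, p, mu, rng):
    """Binary series g(i) of length n: 0 = removed, 1 = preserved."""
    target = int(round(p * n))                      # amount pN of data to remove
    if target == 0:
        return np.ones(n, dtype=int)
    lengths, total = [], 0
    while total < target:                           # step (i): draw lengths until sum >= pN
        l = int(rng.geometric(1.0 / mu))            # assumption: geometric, mean mu, l >= 1
        lengths.append(l)
        total += l
    lengths[-1] -= total - target                   # cut the last segment to hit pN exactly
    m = len(lengths)
    n_ones = n - (target + m)                       # step (iii): size of the "sea" of ones
    if n_ones < 0:
        raise ValueError("too many short segments: no room for separators")
    # Tokens: j >= 0 stands for the pair [l_j, 1] (step (ii)); -1 stands for a single "1".
    tokens = np.concatenate([np.arange(m), np.full(n_ones, -1)])
    rng.shuffle(tokens)                             # step (iii): shuffle the extended series
    g = []
    for t in tokens:                                # step (iv): expand l_j into zeros
        if t < 0:
            g.append(1)
        else:
            g.extend([0] * lengths[t] + [1])
    return np.array(g)

# Usage: remove 30% of a positively correlated signal in segments of mean length 10.
rng = np.random.default_rng(1)
u = fourier_filter_signal(4096, alpha=0.8, rng=rng)
g = removal_mask(len(u), p=0.3, mu=10, rng=rng)
u_tilde = u[g == 1]                                 # surrogate: concatenated remaining data
```

Because every removed segment carries its own trailing "1", two segments that end up adjacent after shuffling stay separated by at least one preserved point, which is the purpose of step (ii).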

In this study, we consider four different functional forms
of the probability distribution P(l) of segment lengths l, i.e.,
exponential, Gaussian, δ- and power-law distributions, and
we use the average length μ of the removed data segments
as a common parameter to compare the effect of removed
data segments with different distributions. For the exponential
and δ-distribution, the average length μ is sufficient to
determine their probability distribution functions. The Gaussian
and power-law distributions require additional parameters to
be clearly defined, and thus, we need to introduce boundary
conditions, so that these parameters can be related to the
average length μ.

FIG. 2: Examples of theoretical probability density for (a) Gaussian
distribution and (b) power-law distribution used in our simulations
of different situations of data loss. The parameters for the
functional form of distributions are determined by the average length
μ we chose for each simulation and by specific boundary conditions,
i.e., for the Gaussian distribution, we set the probability of
the smallest segment length P(l = 1) = 1/pN, and for the power-law
distribution we set the probability of the largest segment length
P(l = l_max) = 1/pN (see text for details).

The functional form of the Gaussian distribution is

    P(l) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(l-\mu)^2}{2\sigma^2} \right],    (8)

where μ is the average and σ is the standard deviation of the
segment lengths l. Since with a fixed small σ, the Gaussian
distribution is not much different from a δ-distribution, and
with a fixed large σ, the Gaussian distribution resembles an
exponential distribution, we relate σ with μ in such a way, as
a boundary condition, that the smallest segment (l = 1) can
only be obtained (statistically) once in each realization, i.e.,
P(l = 1) = 1/pN, where N is the length of the original
signal, and p is the percentage of data loss.
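
The boundary condition P(l = 1) = 1/pN fixes σ once μ, p and N are chosen. A minimal sketch of how σ could be found numerically is given below. It assumes the continuous Gaussian density is evaluated at l = 1 and that μ > 1, and it uses plain bisection on the interval (0, μ − 1], where P(1) increases monotonically with σ; the function names are hypothetical.

```python
import math

def gaussian_pdf(l, mu, sigma):
    """Continuous Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-((l - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def sigma_from_boundary(mu, p, n):
    """Solve gaussian_pdf(1, mu, sigma) = 1 / (p * n) for sigma by bisection."""
    if mu <= 1.0:
        raise ValueError("this sketch assumes mu > 1")
    target = 1.0 / (p * n)
    lo, hi = 1e-6 * (mu - 1.0), mu - 1.0    # P(1) grows with sigma on (0, mu - 1]
    if gaussian_pdf(1.0, mu, hi) < target:
        raise ValueError("boundary condition cannot be met for these parameters")
    for _ in range(200):                    # bisection: halve the bracket each pass
        mid = 0.5 * (lo + hi)
        if gaussian_pdf(1.0, mu, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Usage: mu = 10, 10% data loss, N = 2**20.
sigma = sigma_from_boundary(10.0, 0.1, 2 ** 20)
```

For a fixed μ, a larger pN gives a smaller target probability and hence a narrower Gaussian.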

The functional form of a power-law distribution is given by

    P(l) = a \, l^{-k}, \quad l \in [1, l_{max}],    (9)

with \int_{1}^{l_{max}} P(l) \, dl = 1 and the average length
μ = \int_{1}^{l_{max}} l \, P(l) \, dl. Similar to the Gaussian
distribution, we set the probability of the largest segment to
P(l = l_max) = 1/pN. With these three boundary conditions, we can
relate the three parameters a, k and l_max in Eq. (9) with the
average length μ.

In Fig. 2 we show examples of Gaussian and power-law
distributions with different average lengths μ based on the
criteria described above. Fig. 3 shows examples of our procedure
of data removal. The lengths of the removed segments were
chosen to be exponentially distributed with different average
length.

III. RESULTS 
A. Effect of data loss on global scaling 

Previously, we have studied the effect of data loss on the
scaling behavior of long-range correlated signals by removing
data segments with fixed length [83]. We have found that
data loss in anti-correlated signals substantially changes the
scaling behavior even when only 1% of data are removed. In
contrast, the scaling behavior of (positively) correlated signals
is practically not affected even when up to 50% of the
data are removed. Data loss generally causes a crossover in
the scaling behavior of anti-correlated signals. At the scales
larger than the crossover the anti-correlated scaling behavior
is completely destroyed and resembles uncorrelated behavior.
This crossover is shifted to smaller scales with increasing
percentage of removed data or decreasing length of the removed
segments, indicating a stronger effect on the scaling behavior.

FIG. 3: Illustration of data removal from stationary correlated signals.
Removed data segments (shaded regions) are randomly positioned
within the original signal, and their lengths l are drawn from
an exponential distribution P(l) = (1/μ) exp(−l/μ) with average
value μ. An average length μ = 10 is chosen for (a) the
anti-correlated signal (DFA scaling exponent α = 0.3) and (b) the
positively correlated signal (α = 1.3). Larger segments with μ = 50 are
removed from (c) anti-correlated signal (α = 0.3) and (d) positively
correlated signal (α = 1.3).

In most cases, the length of data loss segments is not fixed
but random, and follows a certain distribution. How does the
distribution of data loss segments influence the scaling behavior
of correlated signals? In some cases, especially when
archaeological data are studied, the percentage of data loss can
be extremely large (and can reach up to 95% loill). Would
such extreme data loss also affect positively correlated signals?
To address these questions, in this section we study the effect
of data loss caused by random removal of data segments that
follow a certain distribution.

First, we consider the case in which the lengths of data
loss segments are exponentially distributed. Following the
approach introduced in Sec. II C we first generate stationary
correlated signals u(i) with length N = 2^20 and with
scaling exponents α ranging from 0.1 to 1.5, and then randomly
remove exponentially distributed data segments from
the original signal u(i) to obtain surrogate signals ũ(i). As
illustrated in Fig. 4, the rms fluctuation function F(n) shows
similar changes in the scaling behavior as observed in [83]
where segments with a fixed length were removed from the
original signal. (i) The scaling behavior of surrogate signals
strongly depends on the scaling exponent α of the original
signals. (ii) The anti-correlated signals substantially change their
scaling behavior even if only 10% of the data are removed
(Fig. 4(a)). A crossover from anti-correlated to uncorrelated
(α = 0.5) behavior appears at scale n_× due to data loss, i.e.,
at the scales larger than n_×, the anti-correlations in the original
signals are completely destroyed. The crossover scale n_×
is shifted to smaller scales with increasing percentage of lost
data. (iii) In contrast, positively correlated signals show practically
no changes for up to 65% of data loss (Fig. 4(b)). Surprisingly,
even with extreme data loss of up to 90% of the
signal the scaling behavior is still practically preserved,
exhibiting a slightly lower exponent α (weaker correlations),
an effect which is less pronounced with increasing values of
α (see Fig. 4(b)).

FIG. 4: (Color online) Effect of data loss on the scaling behavior of
long-range correlated signals with length N = 2^20 (before data
removal), zero mean and unit standard deviation. The lengths of the
removed segments are drawn from an exponential distribution with
mean μ = 10. (a) Scaling behavior of anti-correlated signals (scaling
exponent α < 0.5) with a data loss of 10% (blue circles), 65% (red
triangles) and 90% (green squares). Note that, to obtain an accurate
estimation of the DFA scaling exponent α for anti-correlated signals,
we first integrate the signals and then we apply the DFA method.
Thus, to obtain the correct scaling exponent for anti-correlated signals
we divide F(n) by n to account for the integration of the signals
and next we plot F(n)/n vs. the scale n (see also Sec. II A and
Fig. 15 in [82]). (b) Scaling behavior of positively correlated signals
(scaling exponent α > 0.5) with 10%, 65% and 90% data loss.
The scaling behavior of strongly anti-correlated data is dramatically
changed even when only 10% of the data are removed. A crossover
at scale n_× indicates a transition (arrow), due to loss of data in the
signals, from the original anti-correlated behavior with α = 0.1 to an
uncorrelated behavior with α = 0.5. In contrast, for positively correlated
signals, i.e., 0.5 < α < 1.5, only an extreme data loss of 90%
leads to small deviations from the original scaling behavior. This effect
becomes weaker for increasing values of α. As expected, for
α = 0.5 (white noise) and α = 1.5 (Brownian noise) data removal
does not affect the scaling behavior.

FIG. 5: (Color online) Effect of data loss on the scaling behavior of
short signals (N = 4000 before data removal). (a) Removing up
to 50% of the data (i.e., 2000 data points remain) does not have an
observable effect on the scaling behavior of positively correlated signals
and leads to small deviations from the original scaling behavior
in anti-correlated signals. (b) Extreme data loss of 90% (i.e., only
400 data points remain) leads to more pronounced deviations from
the original scaling behavior. In general, the deviations are smaller
with larger average length μ of removed segments.

Next, we consider the case in which the length of the original
signal is much shorter (N = 4000), as illustrated in Fig. 5.
We find that the scaling behavior of both anti-correlated and
positively correlated signals with extreme data loss changes in
the same way as we observed in Fig. 4 (where N = 2^20). In
addition, we find (see Fig. 5) that when increasing the average
length μ of the data loss segments, the scaling behavior of the
surrogate signals deviates less from the original scaling behavior.
Thus, removing the same percentage of the data using
longer (and fewer) segments has a lesser impact on the scaling
behavior of both positively correlated and anti-correlated
signals compared to removing segments with smaller average
length μ.

FIG. 6: (Color online) Two examples of the effect of extreme data
loss: (a) interstride intervals of human gait, and (b) annual prices of
pepper in England in the period 1209-1914. Removing up to 90%
of the gait intervals and up to 75% of the commodity data using segments
of different average length μ does not significantly affect the
global scaling behavior. Closed symbols represent a single realization
and open symbols indicate the mean and standard deviations obtained
from 100 realizations of randomly removing data segments.
The lengths of the removed data segments are drawn from an
exponential distribution.



To show how missing data segments affect correlations in real-world
signals, we consider two examples of complex scale-invariant dynamics:
(i) human gait as a representative of integrated physiologic systems
under neural control with multiple-component feedback interactions
(Fig. 6(a)), and (ii) commodity price fluctuations from England across
several centuries reflecting complex economic and social interactions
(Fig. 6(b)). In agreement with our tests on surrogate signals shown in
Fig. 4 and Fig. 5, our analyses of real data confirm the observation
that even extreme data loss of up to 90% does not significantly affect
the global scaling behavior of positively correlated (α > 0.5) signals.



B. Properties of removed data segments: Effect of data loss on 
local scaling 

To reveal in greater detail the effect of data loss, we investigate the
local scaling behavior of the F(n) curves by fitting F(n) locally in a
window of size w = 3 log 2. We determine the local scaling exponent
α_loc at different scales n by moving the window w in small steps of
size Δ = (1/4) log 2 starting at n = 4.
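
The local-slope estimate described above can be sketched as follows, assuming F(n) has already been computed on an increasing grid of scales n. The function name, the requirement of at least three points per window, and the default step of a quarter of log 2 are assumptions of this sketch, not details confirmed by the text. Because window and step are expressed as multiples of log 2, the base of the logarithm does not matter: a window of 3 log 2 always spans a factor of 8 in n.

```python
import numpy as np

def local_scaling_exponents(n, F, w=3 * np.log(2), step=0.25 * np.log(2), n_start=4):
    """Slope of log F(n) versus log n inside a sliding window of width w.

    n must be sorted in increasing order. `step` and the 3-point minimum
    are assumptions of this sketch.
    """
    log_n = np.log(np.asarray(n, dtype=float))
    log_F = np.log(np.asarray(F, dtype=float))
    centers, slopes = [], []
    left = np.log(n_start)                      # left edge of the first window
    while left + w <= log_n[-1] + 1e-12:
        inside = (log_n >= left - 1e-12) & (log_n <= left + w + 1e-12)
        if inside.sum() >= 3:                   # need a few points for a linear fit
            slope, _ = np.polyfit(log_n[inside], log_F[inside], 1)
            centers.append(np.exp(left + w / 2))  # geometric center of the window
            slopes.append(slope)
        left += step
    return np.array(centers), np.array(slopes)

# Illustration with an exact power law F(n) = n**0.7 (synthetic, not paper data)
n = np.unique(np.round(2 ** np.arange(2, 12.01, 0.125)).astype(int))
F = n.astype(float) ** 0.7
centers, alpha_loc = local_scaling_exponents(n, F)
```

For a pure power law every window returns the same slope, so any scale dependence of α_loc in real data reflects a genuine deviation from a single scaling regime.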

In Fig. 7(a)-(c) we show α_loc for 10%, 65% and 90% of data loss, where
the average length of the data loss segments is μ = 10 (cf. Fig. 4).
The scaling behavior of anti-correlated signals shows systematic
deviations from the original behavior: the stronger the
anti-correlations, the faster the decay of α_loc towards 0.5
(uncorrelated behavior). The deviations are stronger when more data
are removed from the original signal. Note that when 90% of the data
are removed, the correlation properties of originally anti-correlated
signals are completely destroyed (Fig. 7(c)), because practically no
consecutive data points of the original signals are preserved in the
surrogates when μ = 10 and p = 90% (see Sec. II C and Eq. (?)). When
increasing the average length of the removed segments from μ = 10 to
μ = 100 (Fig. 7(d)-(f)), the scaling behavior of anti-correlated
signals is less affected and α_loc = 0.5 is reached at larger scales.

For positively correlated signals (0.5 < α < 1.5), the effect of data
loss is more complex. The local scaling exponents show significant and
systematic deviations from the original scaling behavior that are not
observed in the rms fluctuation functions F(n) in Fig. 4(b). The
deviations from the original scaling behavior are more pronounced for a
higher percentage of data loss and vary across scales. For small
average length (μ = 10, Fig. 7(a)-(c)), the local scaling exponent is
underestimated at small scales and gradually recovers to the original
scaling behavior at larger scales. For a larger average length of
removed data segments (μ = 100, Fig. 7(d)-(f)), we find overestimated
regions at small scales and underestimated regions at large scales. The
overestimation of the local scaling behavior is more pronounced for
stronger positively correlated signals, while the underestimation is
more pronounced for weaker positively correlated signals.

An interesting phenomenon seen in Fig. 7 is that for anti-correlated
signals the scale at which α_loc reaches 0.5 (uncorrelated behavior) is
shifted towards smaller scales with increasing percentage of data loss.
Similarly, for positively correlated signals, the overestimated and
underestimated regions are also shifted towards smaller scales when a
higher percentage of data is removed. This phenomenon occurs in both
cases, μ = 10 and μ = 100.

To understand precisely how the two parameters (the average length μ of
the data loss segments and the percentage p of data loss) influence
changes in the local scaling behavior, in Fig. 8(a)-(d) we show how
α_loc changes with the average length μ of the removed segments. For
anti-correlated signals, the scale at which α_loc reaches 0.5
monotonically increases and shows a power-law relationship with μ
(Fig. 8(a)). For positively correlated signals, as shown in
Fig. 8(b)-(d), the overestimated regions at small scales as well as the
underestimated regions at large scales are shifted to higher scales
with increasing μ. This shift in the local scaling behavior also
follows a power-law with average length μ (Fig. 8(c), inset).
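
A power-law relationship of this kind is usually quantified by a straight-line fit in the log10 μ versus log10 n plane. The sketch below shows such a fit; the helper name `fit_power_law` and the numbers are hypothetical and purely synthetic (they are not values read from Fig. 8).

```python
import numpy as np

def fit_power_law(x, y):
    """Least-squares fit of y = c * x**k in log-log coordinates; returns (k, c)."""
    k, log_c = np.polyfit(np.log10(x), np.log10(y), 1)
    return k, 10.0 ** log_c

# Synthetic illustration only: crossover scale n_star growing as a power of mu
mu = np.array([10, 20, 50, 100, 200, 500], dtype=float)
n_star = 3.0 * mu ** 0.8          # hypothetical exponent and prefactor
k, c = fit_power_law(mu, n_star)  # recovers the exponent and prefactor used above
```

In practice n_star would be read off the α_loc curves (for example, the scale at which α_loc first reaches 0.5), one value per average segment length μ.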

In Fig. 8(e)-(h), we show how the percentage p of data loss influences
changes in the local scaling behavior. For a fixed average length
μ = 100, we find that the deviation from the original scaling behavior
is more pronounced for higher values of p in both anti-correlated and
positively correlated signals, as also observed in Fig. 7. The scaling
behavior of positively correlated signals also shows overestimated
regions at small scales and underestimated regions at large scales
(Fig. 8(f)-(h)), although not as clearly as in Fig. 8(b)-(d). Both
regions are shifted to larger scales with decreasing percentage of data
loss, as illustrated in the inset in Fig. 8(g).

FIG. 7: (Color online) Effect of data loss on the local scaling behavior (quantified by the local scaling exponent α_loc) of long-range power-law
correlated signals. The symbols indicate average α_loc values obtained from 100 different realizations of surrogate signals with the same
correlation exponent α, and the error bars show the standard deviations. The more data are removed, the more the scaling exponent deviates
from the original exponent. The data loss segments are exponentially distributed with average length μ = 10 ((a)-(c)) and μ = 100 ((d)-(f)).
For anti-correlated signals, the removal of larger segments (μ = 100) has less effect on the scaling behavior. For positively correlated signals,
the deviations vary across scales, showing both overestimated and underestimated regions.

FIG. 8: (Color online) Effect of the average length μ of data loss segments (a)-(d) and effect of the percentage p of data loss (e)-(h) on the
local scaling behavior in anti-correlated signals [(a), (e): α = 0.3] and positively correlated signals [(b), (f): α = 0.7; (c), (g): α = 1.0; (d),
(h): α = 1.3]. For (a)-(d), p = 90% of data are removed, and for (e)-(h), the average length of removed segments is μ = 100. In all the cases,
the removed segments are exponentially distributed, and the length of the original signals is N = 2^20. To clearly see the power-law relation
between the average length μ of removed segments and the scale n at which α_loc achieves the same value, the α_loc values are projected into
the log10 μ-log10 n plane (see color-coded insets in figures (a)-(d)). The symbols in the inset figures in (c) and (g) indicate the positions where
α_loc values reach a maximum (red closed circle) or a minimum (blue open circle), and depict the shift of the overestimated and underestimated
regions to large scales with increasing μ and decreasing p. The local scaling curves highlighted by black symbols correspond to the curves
shown in Fig. 7 (rectangle: μ = 10, p = 90%; diamond: μ = 100, p = 90%; circle: μ = 100, p = 65%; triangle: μ = 100, p = 10%).

FIG. 9: (Color online) Effect of different kinds of distributions of
data loss segments on the local scaling behavior. The power-law
distributed data loss segments lead to higher values of α_loc for
positively correlated signals and lower values for anti-correlated
signals compared to the other distributions. There is no difference
between Gaussian and δ-distributed segments, which yield slightly lower
α_loc values than exponentially distributed segments. For
anti-correlated signals, exponentially, Gaussian and δ-distributed
segments lead to identical α_loc values, whereas the power-law
distribution yields slightly lower local scaling exponents.

To understand whether different functional forms of distributions of
data loss segments have different effects on the scaling behavior, we
repeated the same tests with three other kinds of distributions: a
Gaussian distribution, a δ-distribution (i.e., segments have fixed
length) and a power-law distribution. We find that all three kinds of
distributions show similar deviations from the original local scaling
behavior as reported above for exponentially distributed data loss
segments. However, for power-law distributed segment lengths, the
estimated local scaling exponents are generally higher (lower) across
scales for positively (anti-) correlated signals (Fig. 9). When
increasing the average length μ of the removed data segments or
increasing the percentage p of data loss, the power-law distribution
shows less variation than the other three kinds of distributions
(Fig. 10 and Fig. 11).
FIG. 10: (Color online) Effect of the average length μ of data loss
segments on the local scaling behavior in long-range correlated signals
with α = 1.0. The lengths of the data loss segments are (a)
exponentially distributed, (b) Gaussian distributed, (c) δ-distributed
and (d) power-law distributed. In all the cases, p = 90% of data are
removed, and the length of the original signals is N = 2^20. The way
α_loc changes with μ is similar for exponential, Gaussian and
δ-distributions, while the power-law distribution shows less variation.
The local scaling curves highlighted by black symbols correspond to the
curves shown in Fig. 9.

FIG. 11: (Color online) Effect of the percentage p of data loss on the
local scaling behavior in long-range correlated signals with α = 1.0.
The lengths of the data loss segments are (a) exponentially
distributed, (b) Gaussian distributed, (c) δ-distributed and (d)
power-law distributed. In all the cases, the average length of removed
segments is μ = 100, and the length of the original signals is
N = 2^20. Similar to Fig. 10, the exponential, Gaussian and
δ-distributions show similar changes in α_loc with p, while the
power-law distribution shows less variation. The local scaling curves
highlighted by black symbols correspond to the curves shown in Fig. 9.



C. Properties of remaining data segments: Effect of data loss 
on local scaling 

In the previous section, we tested the effect of data loss 
by specifying the distribution and average length of removed 
segments. In this section, we study the effect of data loss by 
specifying the distribution and average length of remaining 
data segments. The results obtained by focusing on the prop- 
erties of remaining data segments are different from what was 
shown above and will lead to a better understanding of the 
effect of data loss on the scaling behavior of long-range cor- 
related signals. 

The approach to generate the appropriate surrogate signals with
different properties of remaining data segments is similar to the one
described in Sec. II C, except that now the binary series g(i) are
obtained according to the parameters of the remaining data segments,
and the surrogate signals ũ(i) are generated by removing the i-th data
point in the original signal u(i) if g(i) = 1, and preserving the i-th
data point if g(i) = 0. The relation between the average length of data
loss segments (μ_l) and remaining data segments (μ_r) can be derived as
follows:

Let the length of the original signal be N. If p_l is the percentage of
data loss, the amount of data loss is given by N_l = p_l N, and the
amount of remaining data is given by N_r = p_r N = (1 - p_l) N. If μ_l
is the average length of the lost data segments, the number of lost
segments is approximately given by n_l ≈ N_l / μ_l. The number of
remaining data segments is approximately equal to the number of data
loss segments, i.e., n_r ≈ n_l. Hence, the average length of the
remaining data segments is:

$$ \mu_r = \frac{N_r}{n_r} \approx \frac{(1 - p_l)\,N}{p_l N / \mu_l} = \frac{1 - p_l}{p_l}\,\mu_l . \tag{10} $$

Note that the lengths of data loss segments are always geometrically
distributed due to the shuffling procedure in our segmentation approach
(see Sec. II C and Fig. 12).
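
The construction of a surrogate from a binary series g(i) can be illustrated with a simplified stand-in. The sketch below is not the shuffling-based segmentation procedure of Sec. II C; it simply alternates remaining and removed segments, drawing both lengths from geometric distributions (the discrete analogue of the exponential case) and using μ_l = μ_r (1 - p_r)/p_r for the mean removed length. The function names and this alternating scheme are assumptions made for illustration.

```python
import numpy as np

def removal_mask(N, mu_r, p_r, rng):
    """Binary series g: g[i] = 1 marks a removed point, g[i] = 0 a remaining one.

    Simplified alternating scheme (an assumption of this sketch, not the
    paper's shuffling procedure).
    """
    mu_l = mu_r * (1.0 - p_r) / p_r       # mean removed length implied by mu_r, p_r
    g = np.empty(N, dtype=int)
    pos, keep = 0, bool(rng.integers(2))  # random choice of the first segment type
    while pos < N:
        mean = mu_r if keep else mu_l
        # geometric lengths are >= 1 and have mean 1/p
        length = rng.geometric(1.0 / max(mean, 1.0))
        g[pos:pos + length] = 0 if keep else 1
        pos += length
        keep = not keep
    return g

def make_surrogate(u, g):
    """Drop every point with g == 1 and stitch the remaining points together."""
    return np.asarray(u)[np.asarray(g) == 0]

rng = np.random.default_rng(0)
u = rng.standard_normal(400_000)          # placeholder signal (uncorrelated noise)
g = removal_mask(len(u), mu_r=100, p_r=0.35, rng=rng)
surrogate = make_surrogate(u, g)
```

A long-range correlated input signal u(i) would be substituted for the placeholder noise before applying DFA to the stitched surrogate.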

We find similar changes in the scaling behavior as observed in Fig. 7,
where the distribution of removed segment lengths was specified. As
illustrated in Fig. 13, where the lengths of remaining segments are
exponentially distributed, the local scaling behavior of
anti-correlated surrogate signals deviates monotonically from the
original behavior towards uncorrelated behavior at larger scales, while
the local scaling exponents of positively correlated surrogate signals
vary across scales, showing both overestimated and underestimated
regions. These regions as well as the scales at which the
anti-correlated signals reach α_loc = 0.5 are also shifted towards
larger scales when the average length of remaining segments μ_r
increases. However, in contrast to what was observed in Fig. 7, there
is no shift to smaller scales with increasing percentage of data loss.
Note that, according to Eq. (10), an average length μ_r ≈ 10 of
remaining segments and a percentage p_r = 10% of remaining data (as
shown in Fig. 13(c)) corresponds to an average length μ_l = 90 of
removed segments and a percentage p_l = 90% of removed data. Thus the
local scaling behavior observed in Fig. 13(c) is very similar to
Fig. 7(f) (where μ_l = 100 and
FIG. 12: The distributions of remaining data segments (left column)
and corresponding distributions of data loss segments (right column).
The remaining data segments follow (a) exponential, (b) power-law,
(c) Gaussian, and (d) δ-distribution with average length μ_r = 100 and
35% of data remaining. The data loss segments are always geometrically
distributed, independent of the distributions of remaining segments.
Note that the average lengths are practically the same as estimated
from Eq. (10).



p_l = 90%), and Fig. 13(d) (μ_r = 100, p_r = 90%, μ_l ≈ 11) is
similar to Fig. 7(a) (μ_l = 10, p_l = 10%).

In Fig. 14(a)-(d), we show how the local scaling behavior changes with
the average length μ_r of remaining segments. Similar to Fig. 8, where
the distribution of removed segments was specified, the variation of
the local scaling behavior of positively correlated signals also shows
overestimated regions at smaller scales followed by underestimated
regions at larger scales. Both regions are shifted to larger scales
when the average length of remaining segments increases, forming a
power-law relationship between the shift in the local scaling behavior
and μ_r (Fig. 14(c)). For anti-correlated signals the local scaling
behavior also shows a power-law relationship between the scale at which
α_loc reaches 0.5 and the average length μ_r. Note that, according to
Eq. (10), the α_loc curves from μ_r = 8 to 455 in Fig. 14(a)-(d)
correspond to μ_l = 72 to 4095 in Fig. 8(a)-(d); thus the local scaling
behavior in these two regions is very similar.
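
The numerical correspondences quoted in this section follow directly from the relation μ_r = μ_l (1 - p_l)/p_l and its inverse. The short check below reproduces them; the function names are illustrative and not taken from the paper.

```python
def mean_remaining_length(mu_l, p_l):
    """Average length of remaining segments implied by mu_l and loss fraction p_l."""
    return mu_l * (1.0 - p_l) / p_l

def mean_removed_length(mu_r, p_r):
    """Inverse relation: average removed length from mu_r and remaining fraction p_r."""
    return mu_r * (1.0 - p_r) / p_r

# mu_l = 90 with 90% loss leaves segments of about 10 points
print(mean_remaining_length(90, 0.90))
# mu_r = 100 with 90% remaining implies removed segments of about 11 points
print(mean_removed_length(100, 0.90))
# mu_r = 8 ... 455 at 10% remaining maps to mu_l of about 72 ... 4095
print(mean_removed_length(8, 0.10), mean_removed_length(455, 0.10))
# mu_l = 10 with 90% loss leaves barely more than one consecutive point
print(mean_remaining_length(10, 0.90))
```

The last line makes explicit why removing 90% of the data in segments of average length 10 preserves practically no consecutive pairs of original data points.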

With increasing percentage p_r of remaining data, the deviation from
the original scaling behavior becomes smaller (Fig. 14(e)-(h)).
However, for anti-correlated signals, the scale at which α_loc reaches
0.5 does not depend on the percentage of data loss (Fig. 14(e)), in
contrast to Fig. 8(e), where removed data segments were studied.
Similarly, the overestimated regions in positively correlated signals
are also not shifted with the percentage of data loss (Fig. 14(f)-(h),
and compare to Fig. 8(f)-(h)).

Next, we investigate how different kinds of distributions of remaining
data segments influence the local scaling behavior. As illustrated in
Fig. 15, the surrogate signals generated by using a Gaussian or
δ-distribution have almost identical local scaling behavior and the
most pronounced deviation from the original local scaling behavior,
and the power-law distribution shows the smallest deviations. Note
that the local scaling exponents of surrogate signals generated by a
δ-distribution jump to larger α_loc values at certain small scales
when the scaling exponent of the original signal is 1.3, 1.4 and 1.5.
This behavior is caused by the discontinuities in the surrogate signal
at the transition points between remaining data segments, and since
the remaining segments are of fixed length, the transition points
occur periodically. If the segment length (μ_r = 100 in Fig. 15) is an
integral multiple of the size of the fitting boxes (scales) in the DFA
algorithm (e.g., n = 10, 20, 25, 50), the transition points are not
included in any fitting box and thus the rms fluctuation functions of
the surrogate signals will be the same as in the original signals. In
all other cases, the discontinuities inside the fitting box will cause
larger rms fluctuation functions and lead to jumps in the local scaling
exponents at certain scales n < μ_r, as observed in Fig. 15.
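
The alignment argument can be checked with a few lines of code. The sketch assumes non-overlapping DFA boxes that start at the first data point, and surrogate segments of fixed length stitched end to end, so that a transition sits just before every index that is a multiple of the segment length; the function name is hypothetical.

```python
def boxes_contain_transition(n, segment_length, N):
    """True if some box of size n has a stitching point strictly inside it.

    Boxes cover [0, n), [n, 2n), ...; transitions lie between indices
    t - 1 and t for t = segment_length, 2 * segment_length, ...
    """
    for t in range(segment_length, N, segment_length):
        if t % n != 0:   # transition t does not coincide with a box boundary
            return True
    return False

# Segment length 100: box sizes dividing 100 never straddle a transition
aligned = [n for n in range(4, 100) if not boxes_contain_transition(n, 100, 10_000)]
print(aligned)  # [4, 5, 10, 20, 25, 50]
```

All other box sizes below the segment length contain at least one discontinuity, which is where the jumps in α_loc are expected under these assumptions.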

In Fig. 16 we show how the local scaling curves of positively
correlated signals change with the average length μ_r of remaining
segments, which follow an exponential distribution (Fig. 16(a)), a
Gaussian distribution (Fig. 16(b)), a δ-distribution (Fig. 16(c)), and
a power-law distribution (Fig. 16(d)). The Gaussian and
δ-distributions lead to a similar local scaling behavior with regions
of pronounced overestimation and underestimation which are shifted to
larger scales for increasing values of μ_r. This shift is also observed
in the case of the exponential distribution; however, the deviation
from the original scaling behavior (overestimation/underestimation) is
less pronounced. In contrast, the power-law distribution shows less
variation of the local scaling behavior and does not lead to such
distinct regions of over- and underestimated α_loc values. In addition,
the local scaling curves do not show a clear dependency ("shift") on
the average length of remaining segments μ_r.

The variation of the local scaling curves with the percentage p_r of
remaining data for the four different distributions is presented in
Fig. 17. Similar to Fig. 14, the scale of most pronounced deviation
from the original scaling behavior is independent of the percentage p_r
of remaining data.
FIG. 13: (Color online) Effect of data loss on the local scaling behavior of long-range correlated signals. The lengths of the remaining data
segments are exponentially distributed with average length μ_r = 10 ((a)-(c)) and μ_r = 100 ((d)-(f)). The symbols indicate average α_loc
values obtained from 100 different realizations of surrogate signals with the same correlation exponent α, and the error bars show the standard
deviations. The more data are removed, the more the scaling exponent deviates from the original exponent. For anti-correlated signals, the
removal of larger segments (μ_r = 100) has less effect on the scaling behavior. For positively correlated signals, the deviations vary across
scales, showing both overestimated and underestimated regions.
FIG. 14: (Color online) Effect of the average length μ_r of remaining data segments (a)-(d) and effect of the percentage p_r of remaining data
(e)-(h) on the local scaling behavior in anti-correlated signals [(a), (e): α = 0.3] and positively correlated signals [(b), (f): α = 0.7; (c), (g):
α = 1.0; (d), (h): α = 1.3]. For (a)-(d), p_r = 10% of the data remain, and for (e)-(h), the average length of remaining segments is μ_r = 100.
In all the cases, the remaining segments are exponentially distributed, and the length of the original signals is N = 2^20. The symbols in the
inset figures in (c) and (g) indicate the positions where α_loc values reach a maximum (red closed circle) and a minimum (blue open circle),
which show that the overestimated and underestimated regions are shifted to larger scales only with increasing μ_r and are not shifted when the
percentage p_r of remaining data changes. The local scaling curves highlighted by black symbols correspond to the curves shown in Fig. 13
(rectangle: μ_r = 10, p_r = 10%; diamond: μ_r = 100, p_r = 10%; circle: μ_r = 100, p_r = 35%; triangle: μ_r = 100, p_r = 90%).
FIG. 15: Effect of different kinds of distributions of remaining
data segments on the local scaling behavior. The Gaussian and
δ-distributions lead to identical and most pronounced deviations from
the original scaling behavior for both anti-correlated and positively
correlated signals. The power-law distribution leads to the lowest
deviations for anti-correlated signals and a smoother behavior of α_loc
versus n, i.e., a less pronounced over- and underestimation of the
original scaling behavior for positively correlated signals.
Interestingly, for positively correlated signals, all four kinds of
distributions yield the same local scaling exponent α_loc at a certain
scale (n ≈ 300 for μ_r = 100). Note that in the case of the
δ-distribution, large jumps of α_loc values at small scales occur for
original scaling exponents α = 1.3 to 1.5 (see text for more details).



IV. SUMMARY AND CONCLUSION 

In this paper, we studied the effect of extreme data loss on the DFA
scaling behavior of long-range power-law correlated signals. In order
to simulate extreme data loss, often encountered in archaeological and
geological data, we developed a new segmentation approach to generate
correlated signals with randomly removed data segments. Using this
approach, surrogate signals can be generated for different percentages
of data loss, different average lengths and different distributions of
removed/remaining data segments. We compared the DFA scaling behavior
of original and surrogate signals by systematically changing the
percentage of data loss and the average length of removed/remaining
segments, and we also considered different functional forms of the
distributions of removed/remaining segment lengths. We studied changes
in the global scaling behavior as well as in the local scaling
exponents to reveal subtle deviations across scales.

We find that anti-correlated signals are very sensitive to data 
loss. Even if only 10% of the data are removed, the scaling be- 
havior of the surrogate signals changes dramatically, showing 
uncorrelated behavior at large scales. In contrast, positively 
correlated signals are more robust to data loss and no signif- 
icant changes in the global scaling behavior are observed for 



FIG. 16: Effect of different distributions and the average length μ_r
of remaining data segments on the local scaling behavior (original
signal: α = 1.0). In all the cases, p_r = 10% of the data remain, and
the length of the original signals is N = 2^20. The Gaussian and
δ-distributions lead to very similar behavior with most pronounced
α_loc deviations and a clear shift with μ_r. In contrast, the power-law
distribution shows no clear dependency of α_loc on μ_r. The local
scaling curves highlighted by black symbols correspond to the curves
shown in Fig. 15.
FIG. 17: Effect of different distributions of remaining data segments
and the percentage p_r of remaining data on the local scaling behavior
(original signal: α = 1.0). In all the cases, the average length of
remaining segments is μ_r = 100, and the length of the original signals
is N = 2^20. The deviations from the original scaling behavior are more
pronounced for smaller percentages of remaining data. Note that the
scale at which the most pronounced deviation is observed does not
depend on p_r. The local scaling curves highlighted by black symbols
correspond to the curves shown in Fig. 15.



up to 90% of data loss. However, in case of extreme data loss, we find
significant and systematic deviations in the local scaling behavior,
which is overestimated at small scales and underestimated at large
scales. Specifically, we find that for anti-correlated signals the
scale at which the local scaling exponent α_loc reaches 0.5 shifts to
larger scales with increasing average length μ_l (or μ_r) of the
removed (or remaining) segments, following a power-law relationship
with μ_l (or μ_r). For positively correlated signals the regions of
overestimation and underestimation of the local scaling exponent are
also shifted to larger scales, following a power-law with increasing
μ_l (or μ_r).

As expected, increasing the percentage of data loss leads to more
pronounced deviations in the local scaling behavior. However, the
variation of local scaling curves follows different rules depending on
whether the properties of removed segments or of remaining segments are
specified. When the average length μ_l of removed data segments is kept
constant, for increasing percentage p_l of removed data, the deviations
of both anti-correlated and positively correlated signals are shifted
to smaller scales following a power-law with p_l. When we focus on
remaining data segments and keep their average length μ_r constant, the
deviations become more pronounced with decreasing percentage of
remaining data; however, the deviations occur at the same scales.

This behavior can be explained by the relationship between removed and
remaining data. In case of a fixed percentage of removed or remaining
data, μ_l and μ_r are always directly proportional to each other
(Eq. (10)), and therefore the deviations (and the shift of the most
pronounced deviation) show a similar power-law relation with μ_l and
μ_r. Fixing the average length of removed or remaining segments, in
contrast, leads to two different scenarios: (i) fixing μ_l and changing
p_l leads to changes in μ_r, which varies as (1 - p_l)/p_l; (ii) fixing
μ_r and changing p_r leads to changes in μ_l, which varies as
(1 - p_r)/p_r. Since the scale of the most pronounced deviation from
the original scaling behavior is shifted for scenario (i), where μ_r is
changing and μ_l is fixed, but not for scenario (ii), where μ_l is
changing and μ_r is fixed, changes in μ_l do not contribute to the
observed shift. Thus, we suggest that μ_r is the key parameter that
determines the scales at which the scaling behavior is most strongly
influenced, whereas the percentage of data loss determines the extent
of this influence.
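
The two scenarios can be made concrete with a small numerical illustration based on μ_r = μ_l (1 - p_l)/p_l. The percentages below match those used in the figures, but the resulting values are computed here for illustration and are not quoted from the paper; the helper name is hypothetical.

```python
def other_mean_length(mean_length, fraction):
    """Mean length of the complementary segment type.

    Given the mean length and data fraction of one segment type (removed or
    remaining), return the mean length of the other type.
    """
    return mean_length * (1.0 - fraction) / fraction

# Scenario (i): mu_l fixed at 100, p_l varies -> mu_r changes strongly
scenario_i = {p_l: other_mean_length(100, p_l) for p_l in (0.10, 0.65, 0.90)}

# Scenario (ii): mu_r fixed at 100, p_r varies -> only mu_l changes
scenario_ii = {p_r: other_mean_length(100, p_r) for p_r in (0.10, 0.35, 0.90)}

for p_l, mu_r in scenario_i.items():
    print(f"(i)  p_l = {p_l:.2f}: mu_l = 100, mu_r = {mu_r:.1f}")
for p_r, mu_l in scenario_ii.items():
    print(f"(ii) p_r = {p_r:.2f}: mu_r = 100, mu_l = {mu_l:.1f}")
```

In scenario (i), μ_r falls from 900 to about 11 as p_l grows from 10% to 90%, in line with the shift of the deviations towards smaller scales; in scenario (ii), μ_r stays at 100 and no shift is expected.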

Different distributions of the lengths of removed/remaining segments
affect the local scaling behavior differently. For Gaussian and
δ-distributed segment lengths, deviations are most pronounced and
similar in extent, whereas power-law distributed segments show the
smallest deviations and a very different overall behavior when compared
to exponential, Gaussian and δ-distributed segments.

In conclusion, our study shows that it is important to consider not
only the percentage of data loss (removed/remaining data), but also the
average length of remaining segments to identify the scales at which
deviations from the original ("real") DFA scaling behavior are most
pronounced. Therefore, when studying the scaling properties of signals
with extreme data loss, the DFA results should be carefully interpreted
to reveal the real scaling behavior.